Phrase Identification in Cross-Language Information Retrieval
Abstract
Term-sense ambiguity and the difficulty of translating phrases are the main sources of problems in dictionary-based cross-language information retrieval (CLIR) approaches. We propose a term-similarity based translation-phrase identification technique to enhance the retrieval effectiveness of a dictionary-based query translation method. The technique identifies noun phrases in the target language based on the degree of association between every pair of terms drawn from two sets of translation terms. We demonstrate the effectiveness of the technique through a series of experiments using queries in two source languages, Spanish and Indonesian, to retrieve documents in English from the standard TREC (Text Retrieval Conference) collection. Combining this technique with our earlier term-similarity based sense disambiguation technique results in further improvements in retrieval performance.
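The pairwise scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which association measure is used, so the Dice coefficient over document co-occurrence below is an assumption, as are the function names `dice` and `best_translation_pair`, the `threshold` parameter, and the toy index; none of them come from the paper.

```python
from itertools import product

def dice(term_a, term_b, doc_index):
    """Dice coefficient over the sets of documents containing each term.

    Assumed association measure; the paper's actual measure is not given here.
    """
    docs_a = doc_index.get(term_a, set())
    docs_b = doc_index.get(term_b, set())
    if not docs_a or not docs_b:
        return 0.0
    return 2 * len(docs_a & docs_b) / (len(docs_a) + len(docs_b))

def best_translation_pair(translations_a, translations_b, doc_index, threshold=0.5):
    """Score every pair of terms from the two translation sets.

    Returns ((term_a, term_b), score) for the most strongly associated pair,
    or None when no pair reaches the (hypothetical) threshold.
    """
    scored = [
        ((a, b), dice(a, b, doc_index))
        for a, b in product(translations_a, translations_b)
    ]
    if not scored:
        return None
    pair, score = max(scored, key=lambda item: item[1])
    return (pair, score) if score >= threshold else None

# Toy inverted index: target-language term -> ids of documents containing it.
toy_index = {
    "house": {1, 2, 3, 4},
    "home": {2, 5},
    "white": {1, 2, 3, 6},
    "blank": {7},
}

# Dictionary translations of two adjacent source-language query terms.
first_term_translations = ["house", "home"]
second_term_translations = ["white", "blank"]

result = best_translation_pair(
    first_term_translations, second_term_translations, toy_index
)
```

In this toy setup the pair ("house", "white") co-occurs in three documents and scores highest, so it would be kept as a candidate phrase translation, while pairs that never co-occur score zero. A real system would compute the counts from the target-language collection and would still need to decide word order within the phrase.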
Related resources
Semantic annotation for concept-based cross-language medical information retrieval
We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech ...
A survey on phrase structure learning methods for text classification
Text classification is a task of automatic classification of text into one of the predefined categories. The problem of text classification has been widely studied in different communities like natural language processing, data mining and information retrieval. Text classification is an important constituent in many information management tasks like topic identification, spam filtering, email r...
Improving Query Translation for Cross-Language Information Retrieval using a Web-based Approach
With the increasing popularity of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing approaches to improving query translation, such as noun phrase (NP) identification, translation, and word translation selection, require special corpus resources. However, those natural language resources are not readily available. In this paper, we propos...
Word Formation Approach to Noun Phrase Analysis for Thai
Noun phrase analysis is one of the most important components in Natural Language Processing (NLP) applications, such as information retrieval, extraction and categorization. For Thai, noun phrase analysis has unique problems, i.e., noun phrase boundary identification, noun phrase decomposition and its relation extraction, and core noun detection. Statistical and rule based Word formation is, th...
Cross-Lingual Medical Information Retrieval through Semantic Annotation
We present a framework for concept-based, cross-lingual information retrieval (CLIR) in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data, whereby documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes ...